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A 1 . 2 Michael B achmann 

1 Introduction 

For a system under thermal conditions in a heat bath with temperature T, the dynamics of each 
of the system particles is influenced by interactions with the heat-bath particles. If quantum ef- 
fects are negligible (what we will assume in the following), the classical motion of any system 
particle looks erratic; the particle follows a stochastic path. The system can "gain" energy from 
the heat bath by these collisions (which are typically more generally called "thermal fluctua- 
tions") or "lose" energy by friction effects (dissipation). The total energy of the coupled system 
of heat bath and particles is a conserved quantity, i.e., fluctuation and dissipation refer to the 
energetic exchange between heat bath and system particles only. Consequently, the coupled 
system is represented by a microcanonical ensemble, whereas the particle system is in this case 
represented by a canonical ensemble: The energy of the particle system is not a constant of 
motion. Provided heat bath and system are in thermal equilibrium, i.e., heat-bath and system 
temperature are identical, fluctuations and dissipation balance each other. This is the essence of 
the celebrated fluctuation-dissipation theorem [Tj. In equilibrium, only the statistical mean of 
the particle system energy is constant in time. 

This canonical behavior of the system particles is not accounted for by standard Newtonian 
dynamics (where the system energy is considered to be a constant of motion). In order to 
perform molecular dynamics (MD) simulations of the system under the influence of thermal 
fluctuations, the coupling of the system to the heat bath is required. This is provided by a 
thermostat, i.e., by extending the equations of motion by additional heat-bath coupling degrees 
of freedom f2\. The introduction of thermostats into the dynamics is a notorious problem in 
MD and it cannot be considered to be solved satisfactorily to date H. In order to take into 
consideration the stochastic nature of any particle trajectory in the heat bath, a typical approach 
is to introduce random forces into the dynamics. These forces represent the collisions of system 
and heat-bath particles on the basis of the fluctuation-dissipation theorem [1]. 
Unfortunately, MD simulations of complex systems on microscopic and mesoscopic scales are 
extremely slow, even on the largest available computers. A prominent example is the folding of 
proteins with natural time scales of milliseconds to seconds. It is currently still impossible to 
simulate folding events of bioproteins under realistic conditions, since even longest MD runs are 
hardly capable of generating single trajectories of more than a few microseconds. Consequently, 
if the intrinsic time scale of a realistic model exceeds the time scale of an MD simulation of this 
model, MD cannot seriously be applied in these cases. 

However, many interesting questions do not require to consider the intrinsic dynamics of the 
system explicitly. This regards, e.g., equilibrium thermodynamics, which includes all relevant 
phenomena of cooperativity - the collective original source for the occurrence of phase transi- 
tions. Stability of all matter, independently whether soft or solid, requires fundamental ordering 
principles. We are far away from having understood the general physical properties of tran- 
sition processes that separate, e.g., ordered and disordered phases, crystals and liquids, glassy 
and globular polymers, native and intermediate protein folds, ferromagnetic and paramagnetic 
states of metals, Bose-Einstein condensates and bosonic gases, etc. Meanwhile, the history of 
research of collective or critical phenomena has already lasted for more than hundred years and 
the universality hypothesis has already been known for several decades [4J. Though, no com- 
plete theory exists which is capable relating to each other phenomena such as protein folding 
(unfolding) and freezing (melting) of solid matter. The reason is that the first process is domi- 
nated by finite-size effects, whereas the latter seems to be a macroscopic "bulk" phenomenon. 
However, although doubtlessly associated to different length scales which differ by orders of 
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magnitude, both examples are based on cooperativity, i.e., the collective multi-body interplay 
of a large number of atoms. Precise theoretical analyses are extremely difficult, even more, 
if several attractive and repulsive interactions compete with each other and if the system does 
not possess any obvious internal symmetries (which is particularly apparent for "glassy" het- 
eropolymers like proteins). On the experimental side, the situation has not been much better 
as the resolution of the data often did not allow an in-depth analysis of the simultaneous mi- 
croscopic effects accompanying cooperative phenomena. This has dramatically improved by 
novel experimental techniques enabling to measure the response of the system to local manip- 
ulations, giving insight in the mesoscopic and macroscopic multi-body effects upon activation. 
On the other hand, a systematic understanding requires a theoretical basis. The relevant physical 
forces have been known for a long time, but the efficient combination of technical and algorith- 
mic prerequisites has been missing until recently. The general understanding of cooperativity 
in complex systems as a statistical effect, governed by a multitude of forces acting on different 
energy and length scales, requires the study of the interplay of entropy and energy. The key to 
this is currently only provided by Monte Carlo computer simulations O. 



2 Conventional Markov-chain Monte Carlo sampling 
2.1 Ergodicity and finite time series 

The general idea behind all Monte Carlo methodologies is to provide an efficient stochastic 
sampling of the configurational or conformational phase space or parts of it with the objec- 
tive to obtain reasonable approximations for statistical quantities such as expectation values, 
probabilities, fluctuations, correlation functions, densities of states, etc. 

A given system conformation (e.g., the geometric structure of a molecule) X is locally or glob- 
ally modified to yield a conformation X'. This update or "move" is then accepted with the 
transition probability t(X — > X'). Frequently used updates for polymer models are, for exam- 
ple, random translational changes of single monomer positions, bond angle modifications, or 
rotations about covalent bond axes. More global updates consist of combined local updates, 
which can be necessary to satisfy constraints such as fixed bond lengths or simply to improve 
efficiency. It is, however, a necessary condition for correct statistical sampling that Monte Carlo 
moves are ergodic, i.e., the chosen set of moves must, in principle, guarantee to reach any con- 
formation out of any other conformation. Since this is often hard to prove and an insufficient 
choice of move sets can result in systematic errors, great care must be dedicated to choose 
appropriate moves or sets of moves. Since molecular models often contain constraints, the con- 
struction of global moves can be demanding. Therefore, reasonable and efficient moves have to 
be chosen in correspondence to the model of a system to be simulated. 

A Monte Carlo update corresponds to the discrete "time step" Atq in the simulation process. In 
order to reduce correlations, typically a number of updates is performed between measurements 
of a quantity O. This series of updates is called a "sweep" and the "time" passed in a single 
sweep is Ar = A^Atq if the sweep consists of A^ updates. Thus, if M sweeps are performed, 
the discrete "time series" is expressed by the vector (0(rinit + Ar), 0{Tinn + 2Ar), . . . , 0(rinit + 
mAr), . . . , 0(rinit + MAr)) and represents the Monte Carlo trajectory. The period of equili- 
bration Tinit sets the starting point of the measurement. For convenience, we use the abbreviation 
Om = 0(rinit + mAr) and r^ = Ti^n + mAr with m = 1, 2, . . . , M in the following. 
According to the theory of ergodicity, averaging a quantity over an infinitely long time series is 
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identical to perform the statistical ensemble average: 
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(1) 



where VX represents the formal integral measure for the infinitesimal scan of the conformation 
space and p(X) is the energy dependent microstate probability of the conformation X in the 
relevant ensemble in thermodynamic equilibrium [in the canonical ensemble with temperature 
T, simply p(X) = exp[— _E(X)//cb7']]. This is the formal basis for Monte Carlo sampling. 
However, only finite time series can be simulated on a computer. For a finite number of sweeps 

M in a sample k, the relation ([U) can only be satisfied approximately, M~^ ^m=i ^"» ~ 

(i^\ (j^\ 

O ~ (O). Note that the mean value O will depend on the sample k, meaning that it is 

(jt') (fc) 

likely that another sample k' will yield a different value O 7^ O . In order to define a 

reasonable estimate for the statistical error, it is necessary to start from the assumption that we 

have generated an infinite number of independent samples k. In this case the distribution of the 

estimates O is Gaussian, according to the central limit theorem of uncorrelated samples. The 
exact average of the estimates is then given by (O). The statistical error of O is thus suitably 
defined as the standard deviation of the Gaussian: 
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is the autocorrelation function and ar. 
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{01) - (Om)' 

(0^„) — {Om)"^ is the variance of the distribution 
of individual data Om- If the Monte Carlo updates in each sample are performed completely 
randomly without memory, i.e., a new conformation is created independently of the one in the 
step before (which is a possible but typically very inefficient strategy), two measured values Om 
and On are uncorrelated, ifrriy^n. Then, the autocorrelation function simplifies to Amn = ^mn 
and the statistical error satisfies the celebrated relation 
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Since the exact distribution of Om values and the "true" expectation value (O) are unchanged in 
the simulation (but unfortunately unknown), the standard deviation ao^ is constant, too. Thus, 
the statistical error decreases with 1/-\/m11| 

In practice, most of the efficient Monte Carlo techniques generate correlated data, in which case 
we have to fall back to the more general formula ^. It can conveniently be rewritten as 
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'For the actual calculation, it is a problem that <Tq 



is unknown. However, what can be estimated is af^ 
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O^ — O and for its expected value we thus obtain ((Tq ) = <^o [i — ^/M). The 1/M correction is the systematic 
error due to the finiteness of the time series, called bias. The bias-corrected relation for the statistical error reads 

-1/2, 



finally £0 = [Af(M-l)]- 
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with the effective statistics Mgff = M/Ar^c < M, where Arac corresponds to the autocorrela- 
tion time. This means, the statistics is effectively reduced by the number of sweeps until the 
correlations have decayedo Since it takes at least the time Ar^c = N^cAto to generate statisti- 
cally independent conformations, a sweep can simply contain as many updates A^ac as necessary 
to satisfy At ~ Arac without losing effective statistics. In this case, the M ^ Men data enter- 
ing into the effective statistics are virtually uncorrected. This is also the general idea behind 
advanced, computationally convenient error estimation methods such as binning and jackknife 
analyses |l6ll3- For the correctness of the measurements, M ^ Mgff is not a necessary condition; 
more sweeps with less updates in each sweep, i.e., periods between measurements shorter than 
Arac only yield redundant statistical information. This is not even wrong, but computationally 
inefficient as it does not improve the statistical error ([5]). 

2.2 Master equation 

Beside ergodicity, another demand for correct statistical sampling is to ensure that the prob- 
ability distribution p(X.) associated to the desired statistical ensemble is independent of time. 
This can only be achieved in the simulation, if the relevant part of the phase space is sampled 
sufficiently efficient to allow for quick convergence towards a stable or, more precisely, station- 
ary estimate for p(X). In most of the Monte Carlo methods, the simulation follows a Markov 
dynamics, i.e., the update of a given conformation X to a new one X' is not influenced by the 
history that led to X, i.e., the dynamics does not possess an explicit memory. Such a Markov 
process can be described by the master equation: 

^^ = Vb(X')t(X' ^ X; Aro) - p(X)t(X ^ X'; Aro)], (6) 

Aro ^ 

where t(X — )• X'; Atq) is the transition probability from X to X' in a single update (or "time" 
step Aro). Due to particle conservation, it satisfies the normalization condition ^x' ^(^ — )• 
X'; Aro) = !» i-^-, whatever update we perform, we must end up with a state X' which is 
an element of the conformational space. The condition Ap{yi) / Atq = ensures that the 
ensemble is in a stationary state if the right-hand side of Eq. ^ vanishes. Since the stationarity 
condition also allows solutions where the distribution function p(X.) dynamically changes on 
cycles which, however, is not the physical situation in a statistical equilibrium ensemble, we 
demand more rigorously that the expression in the brackets vanishes. This is called the detailed 
balance condition. Consequently, the ratio of transition rates is given by 

t(X^X-;Aro) ^ p(XO 
tiX' ^ X; Aro) p(X) 

and thus independent of the length of "time" step Aro, which we, therefore, omit in the fol- 
lowing. From this relation, it follows that it is obviously a good idea to construct an efficient 
Markov chain Monte Carlo algorithm, i.e., to choose appropriate acceptance probabilities for 
the Monte Carlo updates to yield the correct transition probability t(X — t- X'), by taking into 
account the basic microstate probabilities of the statistical ensemble to be simulated. Markov 
Monte Carlo simulations in the canonical ensemble at fixed temperature T, for example, have 



^For a detailed discussion of the autocorrelation function and the calculation of the autocorrelation time, see, 
e.g.,Ref. |6|. 
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to satisfy 



t(X'^X) ' ^^ 



where AE = -E'(X') — -E'(X) is the energy difference between the new and the old state. Thus, 
the transition rate to reach a state X', supposed to be energetically favored if compared with 
the initial state X, grows exponentially with AE < 0. "Climbing the hill" towards a state with 
higher energy {AE > 0) is, on the other hand, exponentially suppressed. This is in correspon- 
dence with the interpretation of the Markov transition state theory. Hence, it is possible to study 
the kinetic behavior (identification of free-energy barriers, measuring the height of barriers, es- 
timating transition rates, etc.) of a series of processes in equilibrium - for example the folding 
and unfolding behavior of a protein - by means of Monte Carlo simulations. To quantify the 
dynamics of a process, i.e., the explicit time dependence is, however, less meaningful as the 
conformational change in a single time step depends on the move set and does not follow a 
physical, e.g., Newtonian, dynamicsjj 

2.3 Selection and acceptance probabilities 

In order to correctly satisfy the detailed balance condition (|7]) in a Monte Carlo simulation, we 
have to take into account that each Monte Carlo step consists of two parts. First, a Monte Carlo 
update of the current state is suggested and second, it has to be decided whether or not to accept 
it according to the chosen sampling strategy. In fact, both steps are independent of each other in 
the sense that each possible update can be combined with any sampling method. Therefore, it is 
useful to factorize the transition probability t(X — )• X') in the selection probability s(X — )• X') 
for a desired update from X to X' and the acceptance probability a(X — )• X') for this update: 

t(X -^ X') = s(X -^ X')a(X -^ X'). (9) 

The acceptance probability is typically used in the form 

a(X -^ X') = min (1, a(X, X')u;(X ^ X')) , (10) 

with the ratio of microstate probabilities 

and the ratio of forward and backward selection probabilities 

^ ' ^ s(X^X') ^ ^ 

The expression (flOl) for the acceptance probability naturally fulfills the detailed-balance con- 
dition (|7]). The selection ratio cr(X, X') is unity, if the forward and backward selection proba- 
bilities are identical. This is typically the case for "simple" local Monte Carlo updates. If, for 



^The natural way to study the time dependence of Newtonian mechanics is typically based on molecular dy- 
namics methods which, however, suffer from severe problems to ensure the correct statistical sampling at finite 
temperatures by using thermostats [2','3l. From a more formal point of view, it is even questionable what "dynam- 
ics" shall mean in a thermal system, where even under the same thermodynamic conditions trajectories run typi- 
cally differently, due to the "random" thermal fluctuations caused by interactions with the huge number [C(10^^) 
per mol] of realistically not traceable heat bath particles. 
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example, the update is a translation of a coordinate, a;' = x + Ax, where Ax G [—Xq, +Xo] 
is chosen from a uniform random distribution, the forward selection for a translation by Ax 
is equally probable to the backward move, i.e., to translate the particle by —Ax. This is also 
valid for rotations about bonds in a molecular system such as rotations about dihedral angles in 
a protein. If selection probabilities for forward and backward moves differ, the selection rate 
is not unity. This is often the case in complex, global updates which comprise several steps. 
Then, the determination of the correct selection probabilities can be difficult and the selection 
rate has typically to be estimated in test runs first. To this class of updates belong the biased 
Gaussian steps [[8]|, where a series of torsional updates of a few sequential protein backbone 
dihedral angles are performed in order to ensure that the update does not drastically change the 
protein conformation (which would likely be rejected). 

Note that the overall efficiency of a Monte Carlo simulation depends on both, a model-specific 
choice of a suitable set of moves and an efficient microstate sampling strategy based on w(X — )■ 
X'). 



2.4 Simple sampling 

The choice of the microstate probabilities p(X.) is not necessarily coupled to a certain physical 
statistical ensemble. Thus, the simplest choice is a uniform probability p(X) = 1 independently 
of ensemble- specific microstate properties. Thus also ^(X — )• X') = 1 and if the Monte 
Carlo updates satisfy a(X., X') = 1, the acceptance probability is trivially also unity, a(X — )■ 
X') = 1, i.e., all generated Monte Carlo updates are accepted, independently of the type of 
the update. Thus, updates of system degrees of freedom can be performed randomly, where the 
random numbers are chosen from a uniform distribution. This method is called simple sampling. 
However, its applicability is quite limited. Consider, for example, the estimation of the density 
of states for a discrete system with this method. After having performed a series of M updates, 
we will have obtained an energetic histogram h{E) = M^^ X]m=i ^Em,E which represents 
an estimate for the density of states. The canonical expectation value of the energy can be 
estimated by ^ = M~^Y.Z=iE^^~^'"'^''^ = Ee^^I^)^"^^''^^- If the microstates are 
generated randomly from a uniform distribution, it is obvious that we will sample the states X 
with an energy -E(X) in accordance with their system-specific frequency or degeneracy. High- 
frequency states thermodynamically dominate in the purely disordered phase. However, near 
phase transitions towards more ordered phases, the density of states drops rapidly - typically by 
many orders of magnitude. The degeneracies of the lowest-energy states representing the most 
ordered states are so small that the thermodynamically most interesting transition region spans 
even in rather small systems often hundreds to thousands orders of magnitudeu 
To bridge a region of 100 orders of magnitude between an ordered and a disordered phase by 
simple sampling would roughly mean to perform about 10'"° updates in order to find a single 
ordered state. Assuming that a simple single update would require only a few CPU operations, 
it will at least take 1 ns on standard CPU cores. Even under this optimistic assumption, it 
would take more than 10^^ years to perform 10'°" updates on a single core! Thus, for studies of 



"^In order to get an impression of the large numbers consider the 2D Ising model of locally interacting spins on 
a square lattice which can only be oriented parallel or antiparallel. For a system with 50 x 50 = 2500 spins, the 
total number of spin configurations is thus 2^^"° ^ 10^^^. The degeneracy of the maximally disordered energetic, 
paramagnetic is of the same order of magnitude. Since the ferromagnetic ground-state degeneracy is 2 (all spins 
up or all down), i.e., it is of the order of lO", the density of states of this rather small system covers far more than 
700 orders of magnitude. 
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complex systems with sufficiently many degrees of freedom allowing for cooperativity, simple 
sampling is of very little use. 

2.5 Metropolis sampling 

Because of the dominance of a certain restricted space of microstates in ordered phases, it is 
obviously a good idea to primarily concentrate in a simulation on a precise sampling of the 
microstates that form the macrostate under given external parameters such as, for example, 
the temperature. The canonical probability distribution functions clearly show that within the 
certain stable phases, only a limited energetic space of microstates is noticeably populated, 
whereas the probability densities drop off rapidly in the tails. Thus, an efficient sampling of 
this state space should yield the relevant information within comparatively short Markov chain 
Monte Carlo runs. This strategy is called importance sampling. 

The standard importance sampling variant is the Metropolis method [9], where the algorith- 
mic microstate probability J9(X) is identified with the canonical microstate probability p(X) ~ 
g-/3£;(x) ^j- j-j^g given temperature T ((3 = 1/k^T). Thus, the acceptance probability (flOl) is 
governed by the ratio of the canonical thermal weights of the microstates: 

^(X^X') = e-''[^(^')-^(^)l. (13) 

According to Eq. (flOl) . a Monte Carlo update from X to X' (assuming (t(X, X') = 1) is ac- 
cepted, if the energy of the new microstate is lower than before, -E'(X') < -E'(X). If this update 
would provoke an increase of energy, -E'(X') > -E'(X), the conformational change is accepted 
only with the probability e^^^^ , where /\E = -E'(X') — E{X.). Technically, in the simula- 
tion, a random number r G [0, 1) from a uniform distribution is drawn: If r < e^'^^^, the 
move is still accepted, whereas it is rejected otherwise. Thus, the acceptance probability is 
exponentially suppressed with AE and the Metropolis simulation yields, at least in principle, 
a time series which is inherently correctly sampled in accordance with the canonical statistics. 
The arithmetic mean value of a quantity O over the finite Metropolis time series is already an 
estimate for the canonical expectation value: O = M^^ J2m=i ^m ~ {O). In the hypotheti- 
cal case of an infinitely long simulation (M — ;■ oo), this relation is an exact equality, i.e., the 
deviation is due to the finiteness of the time series only. However, it is just this restriction to 
a finite amount of data which limits the quality of Metropolis data. Because of the canonical 
sampling, reasonable statistics is only obtained in the energetic region which is most dominant 
for a given temperature, whereas in the tails of the canonical distributions the statistics is rather 
poor. Thus, there are three physically particularly interesting cases where Metropolis sampling 
as standalone method is little efficient. 

First, for low temperatures, where lowest-energy states dominate, the widths of the canonical 
distributions are extremely small and since (3 ~ 1/T is very large, energetic "uphill" updates 
are strongly suppressed by the Boltzmann weight e~^^^ — )■ 0. That means, once caught in a 
low-energy state, the simulation freezes and it remains trapped in a low-energy state for a long 
period. 

Second, near a second-order phase transition, the standard deviation cte = \/(-E'^) — {E)"^ 
of the canonical energy distribution function gets very large at the critical temperature T^, as 
it corresponds to the maximum (or, in the thermodynamic limit, the divergence) of the spe- 
cific Cv = cr's/k^T'^- Thus, a large energetic space must precisely be sampled ("critical 
fluctuations") which requires high statistics. Since in Metropolis dynamics, "uphill moves" 
with AE > are only accepted with a reasonable rate, if at the transition point the ratio 
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AE/k^Tc > is not too large, it can take a long time to reach a high-energy state if start- 
ing from the low-energy end. Since near Tc the correlation length diverges like ^ ~ kl '^ 
[with T = (T — Tc)/Tc] and the correlation time in the Monte Carlo dynamics behaves like 
tcon ~ kl^^^j the dynamic exponent z allows to compare the efficiencies of different algo- 
rithms. The larger the value of z, the less efficient is the method. Unfortunately, the standard 
Metropolis method turns out to be one of the least efficient methods in sampling critical prop- 
erties of systems exhibiting a second-order phase transition. 

The third reason is that the Metropolis method does also perform poorly at first-order phase 
transitions. In this case, the canonical distribution function is bimodal, i.e., it exhibits two 
separate peaks with a highly suppressed energetic region in-between, since two phases coexist. 
For the reasons already outlined, it is extremely unlikely to succeed if trying to "jump" from 
the low- to the high-energy phase by means of Metropolis sampling; it rather would have to 
explore the valley step by step. Since the energetic region between the phases is entropically 
suppressed - the number of possible states the system can assume is simply too small - it is 
thus quite unlikely that this "diffusion process" will lead the system into the high-energy phase, 
or it will at least take extremely long. 

However, apart from lowest-energy and phase transition regions, the Metropolis method can 
successfully be employed, often in combination with reweighting techniques. 

3 Reweighting methods 

3.1 Single-histogram reweighting 

A standard Metropolis simulation is performed at a given temperature, say Tq. However, it 
is often desirable to get also quantitative information about the changes of the thermodynamic 
behavior at nearby temperatures. Since Metropolis sampling is not a priori restricted to a limited 
phase space, at least in principle, it is indeed theoretically possible to reweight Metropolis data 
obtained for a given temperature Tq = 1/A;b/3o to a different one, T = 1//i;b/3. The idea is 
to "divide out" the Boltzmann factor e"^"^ in the estimates for any quantity at the simulation 
temperature and to multiply it by e^'^^: 

(0)t = - — ^Ot= ^"^=^ ^"^ (14) 

where we have again considered that the MC time series of length M is finite. In practice, 
the applicability of this simple reweighting method is rather limited in case the data series was 
generated in a single Metropolis run, since the error in the tails of the simulated canonical 
histograms rapidly increases with the distance from the peak. By reweighting, one of the noisy 
tails will gain the more statistical weight the larger the difference between the temperatures 
To and T is. In combination with the generalized-ensemble methods to be discussed later in 
this chapter, however, single-histogram reweighting is the only way of extracting the canonical 
statistics off the simulated histograms and works perfectly. 

3.2 Multiple-histogram reweighting 

From each Metropolis run, an estimate for the density of states g{E) can easily be calculated. 
Since the histogram measured in a simulation at temperature T, h{E; T) = X]m=i ^EEm^ is an 
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estimate for the canonical distribution function Pcan{E; T) ~ g{E)e~^^, the estimate for the 
density of states is obtained by reweighting, g{E) = h{E; T)e^^ . However, since in a "real" 
Metropolis run at the single temperature T accurate data can only be obtained in a certain 
energy interval which depends on T, the estimate 'g{E) is restricted to this typically rather 
narrow energy interval and does by far not cover the whole energetic region reasonably well. 
Thus, the question is whether the combination of Metropolis data obtained in simulations 
at different temperatures, can yield an improved estimate 'g{E). This is indeed possible by 
means of the multiple-histogram reweighting method ifTOl . sometimes also called "weighted 
histogram analysis method" (WHAM) [[TT|. Even though the general idea is simple, the actual 
implementation is not trivial. The reason is that conventional Monte Carlo simulation tech- 
niques such as the Metropolis method cannot yield absolute estimates for the partition sum 
^(^) = 'I2e9(^)^~'^^^ ^•^•' estimates for the density of states at different energies gi{E) and 
gj{E') can only be related to each other if obtained in the same run, i.e., i = j, but not if per- 
formed under different conditions. This is not a problem for the estimation of mean values or 
normalized distribution functions at fixed temperatures as long as the Metropolis data obtained 
in the respective temperature threads are used, but interpolation to temperatures where no data 
were explicitly generated, is virtually impossible. Also the multiple-histogram reweighting 
method does not solve the problem of getting absolute quantities, but at least a "reference par- 
tition function" is introduced, which the estimates of the density of states obtained in runs at 
different simulation temperatures can be related to. Thus, interpolating the data between differ- 
ent temperatures becomes possible. 

Basically, the idea is to perform a weighted average of the histograms hi{E), measured in 
Monte Carlo simulations for different temperatures, i.e., at /3j (where i = 1,2,...,/ indexes 
the simulation thread), in order to obtain an estimator for the density of states by combining the 
histograms in an optimal way: 

9[E) - — ^^ T—^ — . (15) 

The exact density of states is given by g{E) = pcaniE;T)Z{T)e^^ and since the normalized 
histogram hi{E)/Mi obtained in the ith simulation thread is an estimator for the canonical 
distribution function Pcan(-E'; Ti), the density of states is in this thread estimated by 

gm = ^Z^e^^^ (16) 

where Zi is the unknown partition function at the iih. temperature. Since in Metropolis simu- 
lations the best-sampled energy region depends on the simulation temperature, the number of 
histogram entries for a given energy will differ from thread to thread. Thus, the data of the 
thread with high statistics at E should in this interpolation scheme get more weight than his- 
tograms with less entries at E. Therefore, the weight shall be controlled by the errors of the 
individual histograms. A possibility to determine a set of optimal weights is to reduce the devi- 
ation of the estimate g{E) for the density of states from the unknown exact distribution {g) (E), 
where the symbol (...) is used to refer to this quantity as the true distribution which would 
have been hypothetically obtained in an infinite number of threads (it should not be confused 
with a statistical ensemble average). As usual, the "best" estimate is the one that minimizes 
the variance a"^ = {{g — {g)Y). Inserting the relation (fT5l) and minimizing with respect to the 
weights Wi yields a solution 

W^ = 4' (17) 
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where a^. = {(gi — {gi)Y) is the exact variance of gfj in the ith thread. Because of Eq. (fT6l) and 
the fact that Zi is an energy-independent constant in the ith thread, we can now concentrate on 
the discussion of the error of the 2th histogram, since o"|. = a'^.Zfe^'^^^/Mf. 
The variance cr^ is also an unknown quantity and, in principle, an estimator for this variance 
would be needed. This would yield an expression that includes the autocorrelation time fllOl 
[Tn - similar to the discussion below Eq. ([5]). However, to correctly keep track of the correla- 
tions in histogram reweighting is difficult and thus also the estimation of error propagation is 
nontrivial. Therefore, we follow the standard argument based on the assumption of uncorre- 
cted Monte Carlo dynamics (which is typically not perfectly true, of course). The consequence 
of this idealization will be that the weights (fTTI) are not necessarily optimal anymore (the ap- 
plicability of the method itself is not dependent of the choice of Wi, but the error of the final 
histogram will depend on the weights). 

In order to determine af^. for uncorrected data, we only need to calculate the probability P{hi) 
that in the zth thread a state with energy E (for simplicity we assume that the problem is discrete) 
is hit hi times in Mj trials, where each hit occurs with the probability phit- This leads to the 
binomial distribution with the hit average {hi) = MjPhit- In the limit of small hit probabilities (a 
reasonable assumption in general if the number of energy bins is large, and, in particular, for the 
tails of the histogram) the binomial turns into the Poisson distribution P{hi) — )• {hi)'^^e~^'^''^ /hi\ 
with identical variance and expectation value, a^, = (hi). Insertion into Eq. (flTl) yields the 
weights 

Since (hi) (E) is exact, the exact density of states can also be written as 

9iE) = IMMz.e'^^^ (19) 

which is valid for all threads, i.e., the left-hand side is independent of i. This enables us to 
replace (hi) everywhere. Inserting expression (fTSi) into Eq. (fT5l) and utilizing the relation (fT9l) 
to replace (hi), we finally end up with the estimator for the density of states in the form 

g^E) = ^'=i^-(^) , (20) 

where the unknown partition sum is given by 

Zi = J29{E)e-^^^, (21) 

E 

i.e., the set of equations (|20|) and ((2T)) must be solved iterativelyo One initializes the recursion 
with guessed values Z- for all threads, calculates the first estimate g'^^^E) using Z- , re- 
inserts this into Eq. (|2TI) to obtain Z^ ', and continues until the recursion process has converged 
close enough to a fixed point. 

There is a technical aspect that should be taken into account in an actual calculation. Since 
the density of states can even for small systems cover many orders of magnitude and also the 
Boltzmann factor can become very large, the application of the recursion relations (|20|) and ((2T)) 



^Note that for a system with continuous energy space which is partitioned into bins of width AE in the simu- 
lation, the right-hand side of Eq. (1211 1 must still be multiplied by AE. 
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often results in overflow errors since the floating-point data types cannot handle these numbers. 
At this point, it is helpful to change to a logarithmic representation which however, makes it 
necessary to think about adding up large numbers in logarithmic form. Consider the special 
but important case of two positive real numbers a > and < 6 < a which are too large 
to be stored such that we wish to use their logarithmic representations aiog = log a and 6iog = 
log b instead. Since the result of the addition, c = a + b, will also be too large, we introduce 
Clog = logc as well. The summation is then performed by writing c = e^^°^ = e"'°8 + e^'°s. 
Since a > b (and thus also aiog > 6iog), it is useful to separate a, and to rewrite the sum 
as e'^'°i^ = e"'°s(l + 6*''°*="'^'°*=). Taking the logarithms yields the desired result, where only the 
logarithmic representations are needed to perform the summation: ciog = aiog+log(l+x), where 
X = b/a = e^'°!5~"'°!5 G [0, 1]. The upper limit x = 1 is obviously associated to a = 6, whereas 
the lower limit x = matters if a > 0, b = OU Since the logarithm of the density of states is 
proportional to the microcanonical entropy, S{E) ~ log g{E), the logarithmic representation 
has even an important physical meaning. 



4 Generalized-ensemble Monte Carlo methods 

The Metropolis method is the simplest importance sampling Monte Carlo method and for this 
reason it is a good starting point for the simulation of a complex system. However, it is also 
one of the least efficient methods and thus one will often have to face the question of how to 
improve the efficiency of the sampling. One of the most frequently used "tricks" is to employ 
a modified statistical ensemble within the simulation run and to reweight the obtained statistics 
after the simulation. The simulation is performed in an artificial generalized ensemble. 



4.1 Replica-exchange Monte Carlo method (parallel tempering) 

Although not being most efficient, parallel tempering is the most popular generalized-ensemble 
method. Advantages are the simple implementation and parallelization on computer systems 
with many processor cores. The Metropolis method samples conformations of the system in 
a single canonical ensemble at a fixed temperature, whereas replica-exchange methods like 
parallel tempering simulate / ensembles at temperatures Ti, T2, . . . , T/ in parallel (and thus / 
replicas or instances of the system) [fT2l - [T4l . In each of the / temperature threads, standard 
Metropolis simulations are performed. A decrease of the autocorrelation time, i.e., an increase 
in efficiency, is achieved by exchanging replicas in neighboring temperature threads after a 
certain number of Metropolis steps are performed independently in the individual threads. The 
acceptance probability for the exchange of the current conformation X at temperature Tj = 
1/^bA and the conformation X' at [5j is given by 

a(X O X'; A,/3,) = min(l,exp{-(A - /3,)[i5(X') - E{X)]}), (22) 



^At the lower limit, there is a numerical problem, if &iog — aiog ^ (or a; = h/a ^ 1) is so small that the 
minimum allowed floating-point number is underflown by x. This typically occurs if a and 6 differ by many tens 
to thousands orders of magnitude (depending on the floating-point number precision). In this case, the difference 
between c and a cannot be resolved, as the error in ciog = aiog + 0{x) is smaller than the numerical resolution; in 
which case we simply set ciog — aiog. If this is not acceptable and a higher resolution is really needed, non-standard 
concepts of handling numbers with arbitrary precision could be an alternative. 
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which satisfies the detailed balance condition in this generalized ensemblefl Since the tem- 
perature of each thread is fixed, only a small section of the density of states can be sampled 
in each thread because of the Metropolis limitations. In order to obtain an entire estimate of 
the density of states, the pieces obtained in the different threads must be combined in an opti- 
mal way. This is achieved by subsequent multiple-histogram reweighting. The main advantage 
of parallel tempering is its high parallelizability. However, the most efficient selection of the 
temperature set can be a highly sophisticated task. One necessary condition for reasonable 
exchange probabilities is a sufficiently large overlap of the canonical energy distribution func- 
tions in neighboring ensembles. At very low temperatures, the energy distribution is typically 
a sharp-peaked function. Thus, the density of temperatures must be much higher in the regime 
of an ordered phase, compared with high-temperature disordered phases. For this reason, the 
application of the replica-exchange method is often not particularly useful for unraveling the 
system behavior at very low temperatures or near first-order transitions. 

4.2 Multicanonical sampling 



The powerful multicanonical method [IT5UT7II makes it possible to scan the whole phase space 
within a single simulation with very high accuracy [18], even if first-order transitions occur. The 
principle idea is to deform the Boltzmann energy distribution pcan{E] T) oc g{E) exp{—(3E) in 
such a way that the notoriously difficult sampling of the tails is increased and - particularly 
useful - the sampling rate of the entropically strongly suppressed lowest-energy conformations 
is improved. In order to achieve this, the canonical Boltzmann distribution is modified by the 
multicanonical weight VTmucal-^'; T) which, in the ideal case, flattens the energy distribution: 

W^muca(^; T)p,,^{E; T) ~ /ln,uca(^) = COnSt^;.^, (23) 

where h^aca{E) denotes the (ideally flat) multicanonical histogram. By this construction, the 
multicanonical simulation performs a random walk in energy space which rapidly decreases 
the autocorrelation time in entropically suppressed regions. This is particularly apparent and 
important in the phase separation regime at first-order-like transitions, as it is schematically 
illustrated in Fig. [B 

Recalling that the simulation temperature T does not possess any meaning in the multicanonical 
ensemble as, according to Eq. (|23l) . the energy distribution is always constant, independently of 
temperature. Actually, it is convenient to set it to infinity in which case Yvuvt^oo Pca.n{E] T) ~ 
g{E) and thus YimT^oo W^^ca.{E] T) ~ g'^{E). Then, the acceptance probability (fTOl) is gov- 
erned by 

^(X ^ X') = W^^u,,(E(X'))/W^„,uca(^(X)) = g{EOq)lg{E{X!)). (24) 

The weight function can suitably be parametrized as 

W^uca(^) ~ eM-S{E)/k^] = exp{-/3(E)[E - F{E)]}, (25) 

where S{E) is the microcanonical entropy S{E) = kB\ng{E). Since (3{E) = dS{E)/dE is 
the microcanonical thermal energy (with /3{E) = l/k^T{E), where T{E) is the microcanonical 



^In the generalized ensemble composed of two canonical ensembles at temperatures Ti and Tj, the probability 
for a state X at T, and a state X' at T, reads p(X, X'; T^, T,) - exp{-[/3,£'(X) + /3j£'(X')]}. 
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Fig. 1: Typical scenario of a first-order transition at transition temperature T,,-: Ordered and 
disordered phases, represented by the peaked sections of the canonical energy distribution 
Pcan{E;Ttr) at low and high energies, are separated by an entropically strongly suppressed 
energetic region. The multicanonical weight function Wmuca{E\Ttr) is chosen in such a way 
that multicanonical sampling provides a random walk in energy space, independently of (en- 
ergetic) free-energy barriers. Thus, the energy distribution hmuca{E) is ideally constant in the 
multicanonical ensemble. 



temperature), the microcanonical free-energy scale f{E) = /3(E) F(E) and /3(E) are related to 
each other by the differential equation 



df(E) d^(E) 



dE 



dE 



E. 



(26) 



Since /3(E) and f(E) are unknown in the beginning of the simulation, this relation must be 
solved recursively. This can be done in an efficient way [|16[|17lfT9l . If not already being discrete 
by the model definition, the energy spectrum must be discretized, i.e., neighboring energy bins 
are separated by an energetic step size e. Thus, for the estimation of /3(E) and f(E), the 
following system of difference equations needs to be solved recursively {s(E) = S(E)/kB)'- 
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(27) 



The superscript (n) refers to the index of the iteration. If no better initial guess is available, one 
typically sets g^^\E) = 1 in the beginning, implying s^^\E) = Wmuca{E) = 0. The zeroth 
iteration thus corresponds to a Metropolis run at infinite temperature, yielding the first estimate 
for the multicanonical weight function WmucaiE), which is used to initiate the second recursion, 
etc. The recursion procedure based on Eq. (|27l) can be stopped after / recursions, if the weight 
function has sufficiently converged. The number of necessary recursions and also the number 



Monte Carlo Simulations A 10. 15 

of sweeps to be performed within each recursion is model dependent. Since the sampled energy 
space increases from recursion to recursion and the effective statistics of the histogram in each 
energy bin depends on the number of sweeps, it is a good idea to increase the number of sweeps 
successively from recursion to recursion. Since the energy histogram should be "flat" after the 
simulation run at a certain recursion level, an alternative way to control the length of the run is 
based on a flatness criterion. If, for example, minimum and maximum value of the histogram 
deviate from the mean histogram value by less than 20%, the run is stopped. 

Finally, after the best possible estimate for the multicanonical weight function is obtained, a 
long multicanonical production run is performed, including all measurements of quantities of 
interest. From the multicanonical trajectory, the estimate of the canonical expectation value of 
a quantity O is then obtained at any (canonical) temperature T by: 

Since the accuracy of multicanonical sampling is independent of the canonical temperature and 
represents a random walk in the entire energy space, the application of reweighting procedures 
is lossless. This is a great advantage of the multicanonical method, compared with Metropolis 
Monte Carlo simulations. Virtually, a multicanonical simulation samples the system behavior at 
all temperatures simultaneously, or, in other words, the direct estimation of the density of states 
is another advantage, because multiple-histogram reweighting is not needed for this (in contrast 
to replica-exchange methods). 



4.3 Wang-Landau method 

In multicanonical simulations, the weight functions are updated after each iteration, i.e., the 
weight and thus the current estimate of the density of states are kept constant at a given recursion 
level. For this reason, the precise estimation of the multicanonical weights in combination with 
the recursion scheme (|27|) can be a complex and not very efficient procedure. In the method 
introduced by Wang and Landau EOl . the density of states estimate is changed by a so-called 
modification factor a after each sweep, g{E) — )■ a^"^g{E), where a^^^ > 1 is kept constant 
in the nth recursion, but it is reduced from iteration to iteration. A frequently used ad hoc 
modification factor is given by a*^"^ = yW^"^ = (a(°))^/^", n = 1,2,...,/, where often 
Q,(o) — gi — 2.718 ... is chosen. The acceptance probability and histogram flatness criteria are 
the same as in multicanonical sampling. 

Since the dynamic modification of the density of states in the running simulation violates the 
detailed balance condition ([7]), the advantage of the high-speed scan of the energy space is paid 
by a systematic error. However, since the modification factor is reduced with increasing iteration 
level until it is very small (the iteration process is typically stopped if a < 1.0 -|- 10~^), the 
simulation dynamics is supposed to sample the phase space according to the stationary solution 
of the master equation such that detailed balance is (almost) satisfied. Since it is difficult to 
keep this convergence under control, the optimal method is to use the Wang-Landau method 
for a very efficient generation of the multicanonical weights, followed by a long multicanonical 
production run (i.e., at exactly a = 1) to obtain the statistical data. 
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5 Summary 

Monte Carlo computer simulations are virtually the only way to analyze the thermodynamic 
behavior of a system in a precise way. However, the various existing methods exhibit extreme 
differences in their efficiency, depending on model details and relevant questions. The original 
standard method. Metropolis Monte Carlo, which provides only reliable statistical information 
at a given (not too low) temperature has meanwhile been replaced by more sophisticated meth- 
ods which are typically far more efficient (the differences in time scales can be compared with 
the age of the universe). However, none of the methods yields automatically accurate results, 
i.e., a system-specific adaptation and control is always needed. Thus, as in any good experiment, 
the most important part of the data analysis is statistical error estimation. 
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